Efficient query-driven biclustering of gene expression data using Probabilistic Relational Models
Abstract
Biclustering is an increasingly popular technique for identifying gene regulatory modules that are linked to biological processes. We describe a novel method, called ProBic, that was developed within the framework of Probabilistic Relational Models (PRMs). ProBic is an efficient biclustering algorithm that simultaneously identifies a set of potentially overlapping biclusters in a gene expression dataset, and it can be used in both a query-driven and a global setting. The model naturally handles missing values, and explicit modeling of noise yields robust sets of biclusters. The maximum likelihood solution is approximated using an Expectation-Maximization (EM) strategy. ProBic was applied to various synthetic gene expression datasets. The results confirmed that ProBic successfully identifies biclusters under varying levels of noise, overlap and missing values, in both the query-driven and the global setting. Additional expert knowledge can be introduced through a number of prior distribution parameters, and the default settings were shown to be applicable to a wide range of datasets. Our results on synthetic data show that PRMs can be used to identify overlapping biclusters in an efficient and robust manner.
Keywords: biclustering, probabilistic relational model, gene expression, regulatory module, expectation-maximization.
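The abstract says that ProBic approximates the maximum likelihood solution with EM, handles missing values, and models noise explicitly. The following is not ProBic. It is a hypothetical, heavily simplified query-driven EM in plain Python that makes three assumptions. First, there is a single bicluster spanning all conditions. Second, expression inside the bicluster is Gaussian, with a per-condition mean and a shared variance. Third, a fixed zero-mean Gaussian background component stands in for the noise model. The function name `query_driven_em` and the parameters `bg_var` and `min_var` are illustrative inventions. Missing values (`None`) are skipped, so the likelihood is computed over observed entries only.

```python
import math
from typing import List, Optional, Tuple

# Hypothetical representation: rows are genes, columns are conditions; None = missing.
Matrix = List[List[Optional[float]]]


def log_normal(x: float, mean: float, var: float) -> float:
    """Log density of a univariate Gaussian N(mean, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def query_driven_em(X: Matrix, query: int, n_iter: int = 50,
                    bg_var: float = 4.0, min_var: float = 1e-3
                    ) -> Tuple[List[float], List[float]]:
    """Toy EM (not ProBic): soft-assign genes to one bicluster seeded by a query gene.

    Assumptions: the bicluster has one mean per condition and a shared variance;
    the background (noise) component is a fixed N(0, bg_var). Missing entries
    are skipped in both the E-step and the M-step.
    Returns (membership responsibilities per gene, bicluster mean profile).
    """
    n_genes, n_conds = len(X), len(X[0])
    # Seed the bicluster profile with the query gene's profile (missing -> 0.0).
    mu = [v if v is not None else 0.0 for v in X[query]]
    var, prior = 1.0, 0.5
    resp = [0.5] * n_genes
    for _ in range(n_iter):
        # E-step: posterior probability that each gene belongs to the bicluster.
        for g in range(n_genes):
            ll_in, ll_out = math.log(prior), math.log(1 - prior)
            for c in range(n_conds):
                x = X[g][c]
                if x is None:
                    continue  # a missing value contributes nothing to the likelihood
                ll_in += log_normal(x, mu[c], var)
                ll_out += log_normal(x, 0.0, bg_var)
            m = max(ll_in, ll_out)  # subtract the max for numerical stability
            p_in, p_out = math.exp(ll_in - m), math.exp(ll_out - m)
            resp[g] = p_in / (p_in + p_out)
        resp[query] = 1.0  # the query gene is always a member
        # M-step: re-estimate per-condition means from observed entries.
        for c in range(n_conds):
            obs = [g for g in range(n_genes) if X[g][c] is not None]
            den = sum(resp[g] for g in obs)
            if den > 0:
                mu[c] = sum(resp[g] * X[g][c] for g in obs) / den
        # M-step: shared variance and mixing weight.
        sq, wsum = 0.0, 0.0
        for g in range(n_genes):
            for c in range(n_conds):
                if X[g][c] is not None:
                    sq += resp[g] * (X[g][c] - mu[c]) ** 2
                    wsum += resp[g]
        var = max(sq / wsum, min_var)
        prior = min(max(sum(resp) / n_genes, 1e-6), 1 - 1e-6)
    return resp, mu
```

Consider a five-gene matrix in which genes 0 to 2 share a profile (gene 1 has one missing value) and genes 3 and 4 stay near zero. Seeding with `query=0` should then give responsibilities close to 1 for genes 1 and 2 and close to 0 for genes 3 and 4. Seeding from a query gene is what makes this toy run query-driven. A global variant would need several components and some other initialization.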
Similar resources
Comparison of Biological Significance of Biclusters of SIMBIC and SIMBIC+ Biclustering Models
A query-driven biclustering model addresses the problem of extracting biclusters based on a query gene or query condition. The extracted biclusters consist of a set of genes and a subset of conditions that are similar to the query gene or query condition, and they also include the query input. Two approaches applied to biclustering problems are top-down and bottom-up, based on how they tackle the p...
BayesStore: managing large, uncertain data repositories with probabilistic graphical models
Several real-world applications need to effectively manage and reason about large amounts of data that are inherently uncertain. For instance, pervasive computing applications must constantly reason about volumes of noisy sensory readings for a variety of reasons, including motion prediction and human behavior modeling. Such probabilistic data analyses require sophisticated machine-learning too...
Graphical models for biclustering and information retrieval in gene expression data
Aalto University, P.O. Box 11000, FI-00076 Aalto www.aalto.fi Author José Caldas Name of the doctoral dissertation Graphical Models for Biclustering and Information Retrieval in Gene Expression Data Publisher School of Science Unit Department of Information and Computer Science Series Aalto University publication series DOCTORAL DISSERTATIONS 33/2012 Field of research Bioinformatics Manuscript ...
Efficient Mining Differential Co-Expression Constant Row Bicluster in Real-Valued Gene Expression Datasets
Biclustering aims to mine a number of co-expressed genes under a set of experimental conditions in gene expression dataset. Recently, differential co-expression biclustering approach has been used to identify class-specific biclusters between two gene expression datasets. However, it cannot handle differential co-expression constant row biclusters efficiently in real-valued datasets. In this pa...
A Trust Based Probabilistic Method for Efficient Correctness Verification in Database Outsourcing
Correctness verification of query results is a significant challenge in database outsourcing. Most of the proposed approaches impose high overhead, which makes them impractical in real scenarios. Probabilistic approaches are proposed in order to reduce the computation overhead pertaining to the verification process. In this paper, we use the notion of trust as the basis of our probabilistic app...
Publication date: 2008